Multiprocessor Memory Reference Generation Using
Abstract
This paper presents Cerberus, an efficient system for simulating the execution of shared-memory multiprocessor programs on a uniprocessor workstation. Using EDS (execution driven simulation), it generates address traces which can be used to drive cache simulations on the fly, eliminating the large disk space requirements needed by trace files. It is fast because it links the program to be traced together with the cache or statistics gathering tool into a single executable, which eliminates the context-switching needed by communicating processes. It is flexible because it has a simple interface which allows users to easily add any kind of module to use the generated trace information. It compares favorably to other existing tracers and runs on a commonly available workstation. And it is accurate, allowing cycle-by-cycle interactions between the simulated processors. The resulting slowdown from Cerberus is approximately 31 in uniprocessor mode and 45-50 in multiprocessor mode relative to the workloads run natively on the same machines. We demonstrate that EDS uses only 5 percent of the total execution cycles when combined with a cache simulator and show that EDS is just as efficient as using trace driven simulation. The implementation details of Cerberus are provided here, along with a performance analysis of multiprocessor simulation in the Cerberus environment. Some of the other simulation and trace generation tools are surveyed, with the strengths and weaknesses of those tools discussed.

The work presented here has been supported in part by the State of California under the MICRO program, Sun Microsystems, Toshiba Corporation, Fujitsu Microelectronics, Cirrus Corporation, Microsoft Corporation, Quantum Corporation, and Sony USA Research Laboratories. Partial support was also provided by Siemens A.G., which supported Jeff Rothman during some of this work.
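The abstract's central idea, feeding memory references to a cache simulator as they are generated instead of writing them to a trace file, can be illustrated with a minimal sketch. This is not Cerberus's actual interface: the names `DirectMappedCache`, `traced_workload`, `simulate`, and the `(cpu, addr)` reference format are all assumptions made for illustration, and the "instrumented program" is a stand-in loop.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback type: the traced program calls emit(cpu, addr)
# for every memory reference instead of appending to a trace file.
Emit = Callable[[int, int], None]


class DirectMappedCache:
    """Tiny direct-mapped cache model that only counts hits and misses."""

    def __init__(self, num_lines: int = 64, line_size: int = 32) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags: List[Optional[int]] = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_size      # cache-line-sized block number
        index = block % self.num_lines      # direct-mapped placement
        if self.tags[index] == block:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = block


def traced_workload(emit: Emit) -> None:
    """Stand-in for an instrumented program (assumption): two simulated
    processors take turns sweeping the same small array."""
    for _ in range(4):                      # four passes over the array
        for cpu in range(2):                # interleave the two processors
            for i in range(8):
                emit(cpu, 0x1000 + 16 * i)  # 16-byte stride


def simulate(workload: Callable[[Emit], None]) -> Dict[int, DirectMappedCache]:
    """Run the workload and the cache model in one process, with one
    private cache per simulated processor and no stored trace."""
    caches: Dict[int, DirectMappedCache] = {}

    def emit(cpu: int, addr: int) -> None:
        caches.setdefault(cpu, DirectMappedCache()).access(addr)

    workload(emit)
    return caches


if __name__ == "__main__":
    for cpu, cache in sorted(simulate(traced_workload).items()):
        print(f"cpu {cpu}: hits={cache.hits} misses={cache.misses}")
```

Because the consumer is just a function passed to the workload, a different statistics module can be substituted without changing the traced program, which is the kind of flexibility the abstract attributes to a simple trace interface.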
Similar resources
Multiprocessor Memory Hierarchies
Keywords: parallel computer architecture; high performance system design; system bus; caches; memory hierarchies; shared memory machines
Memory latency, bandwidth, and locality of reference will play larger roles in future parallel systems as processors speed up relative to main memory latency. Using an instruction level PA-RISC multiprocessor simulator, we examined hardware and software techniques that ...
Effect of Hot-Spots on the Performance of Crossbar Multiprocessor Systems
Atiquzzaman, M. and M.M. Banat, Effect of hot-spots on the performance of crossbar multiprocessor systems, Parallel Computing 19 (1993) 455-461. Previous studies on the performance of crossbar multiprocessor systems have assumed a uniform memory reference model. Hot-spots arising in multiprocessor systems due to the use of shared variables, synchronization primitives, etc. give rise to nonunifo...
A Hybrid Approach to Trace Generation for Performance Evaluation of Shared-Bus Multiprocessors
This paper describes a hybrid methodology (based on both actual and synthetic reference streams) to produce traces representing significant complete workloads. By means of a software approach, we generate traces that include both user and kernel references, starting from source traces containing only user references. We consider the aspects of the kernel that have a deeper impact on the multiproces...
Pii: S0045-7906(98)00028-7
Performance evaluation of multiple-bus multiprocessor systems is usually carried out under the assumption of a uniform memory reference model. Hot spots arising in multiprocessor systems due to the use of shared variables, synchronization primitives, etc. give rise to a non-uniform memory reference pattern. The objective of this paper is to study the performance of multiple bus multiprocessor syste...
A Workload Generation Environment for Trace-Driven Simulation of Shared-Bus Multiprocessors
We describe an environment to produce traces representing significant workloads for a shared-bus shared-memory multiprocessor used as a general-purpose multitasking machine, where each processor can include multithread facilities. By means of an exclusively software approach, the environment produces traces that include both user and kernel references, starting from source traces containing onl...